Enhanced Image Search by Deducing User Search Goals using Query Logs

Author

  • T. Poongothai
Abstract

In web search applications, users submit queries to search engines to represent their information needs. However, queries do not always represent users' specific information needs exactly: many ambiguous queries cover a broad topic, and different users may want information on different aspects when they submit the same query. We define user search goals as the information on different aspects of a query that user groups want to obtain. An information need is a user's particular desire to obtain information to satisfy his/her need, and user search goals can be considered as the clusters of information needs for a query. The aim is to discover the number of diverse user search goals for a query and to depict each goal with some keywords. First, a novel approach is proposed to infer user image search goals for a query by clustering feedback sessions. A feedback session is defined as the series of both clicked and unclicked URLs in a session from user click-through logs, ending with the last URL that was clicked. Then, a novel optimization method is proposed to map the feedback sessions to pseudo-documents, which can efficiently reflect user information needs. Finally, these pseudo-documents are clustered to infer user image search goals, and a weighting factor is used to enhance web search.

Keywords: Feedback session, Logs, Pseudo-documents, Image restructuring

I. INTRODUCTION

In web search applications, users submit queries (i.e., some keywords) to search engines to represent their search goals. However, in many cases, queries may not exactly represent what they want, since the keywords may be polysemous or cover a broad topic, and users tend to formulate short queries rather than take the trouble of constructing long and carefully stated ones. Besides, even for the same query, users may have different search goals. We find that users have different search goals for the same query due to the following three reasons.
Inferring user search goals is very important for improving search-engine relevance and user experience. The captured user image-search goals can be utilized in many applications. For example, we can use user image-search goals as visual query suggestions to help users reformulate their queries during image search. We can also categorize image search results according to the inferred user image-search goals to make them easier for users to browse. Furthermore, we can diversify and rerank the results retrieved for a query in image search using the discovered user image-search goals. Thus, inferring user image-search goals is one of the key techniques for improving users' search experience.

However, although there has been much research on text search, few methods have been proposed to infer user search goals in image search. Some works try to discover user image-search goals based on textual information (e.g., external texts, including the file name of the image file, the URL of the image, the title of the web page that contains the image, and the surrounding texts in image search results, as well as the tags given by users). However, since external texts are not always reliable (i.e., not guaranteed to precisely describe the image contents) and tags are not always available (i.e., the images may not have corresponding tags, which need to be intentionally created by users), these textual-information-based methods still have limitations.

It should be possible to infer user image-search goals from the visual information of images (i.e., image features), since different image-search goals usually have particular visual patterns that distinguish them from each other. However, since there are semantic gaps between the existing image features and the image semantics, inferring user image-search goals from visual information is still a big challenge. Therefore, in this paper, we propose to introduce additional information sources to help narrow these semantic gaps.
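As an illustration of the feedback session defined in the abstract (the clicked and unclicked URLs of a session, ending with the last clicked URL), the following minimal Python sketch may help. The `Result` record and the `feedback_session` function are hypothetical names introduced here for illustration, and the click log is assumed to be already grouped into one ranked result list per session; the paper does not specify its data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One entry of a ranked result list (hypothetical record layout)."""
    url: str
    clicked: bool


def feedback_session(results: List[Result]) -> List[Result]:
    """Keep results up to and including the last clicked one.

    Results ranked below the last click are dropped, since it is unclear
    whether the user examined them. With no clicks, the session is empty.
    """
    last_click = -1
    for index, result in enumerate(results):
        if result.clicked:
            last_click = index
    return results[: last_click + 1]


# Usage: five ranked results, clicks on the 2nd and 4th.
ranked = [
    Result("http://example.com/1", False),
    Result("http://example.com/2", True),
    Result("http://example.com/3", False),
    Result("http://example.com/4", True),
    Result("http://example.com/5", False),
]
session = feedback_session(ranked)
print([(r.url, r.clicked) for r in session])
```

In this sketch the unclicked results before the last click are retained, because the definition treats them as examined but skipped, which is itself a signal about what the user was not looking for.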
II. LITERATURE SURVEY

U. Lee, Z. Liu, and J. Cho [1] proposed automatic identification of user search goals. They stated that the majority of queries have a predictable goal. Their taxonomy of query goals is based on two types:

Navigational queries: The user has a particular web page in mind and is primarily interested in visiting that page. The user may either have visited that site before or just assume that such a site exists. Here, the user will only visit the correct site.

Informational queries: The user does not have a particular page in mind or intends to visit multiple pages to learn about the topic. The user is exploring web pages that provide background knowledge about a particular query topic. Users click on multiple results because they do not assume a particular website to be the single correct answer.

Two features are used for the prediction of the user goal:

1. Past user-click behavior: If a query is navigational, users will primarily click on the result that they have in mind. Therefore, by observing the past user-click behavior on the query, we can identify the goal.
2. Anchor-link distribution: If users associate a particular query with a particular website, then most of the links that contain the query as anchor text will point to that website. Hence, by observing the destinations of the links with the query keyword as the anchor, we can identify the potential goal of the query.

Limitations: The user queries were taken from a CS department, so they may show a technical bias and are well crafted. In short, queries given by CS students are potentially work related, so the characteristics observed may not hold for queries from the general public.

X. Wang and C.-X. Zhai [2] proposed clustering of search results, which organizes them and allows a user to navigate to relevant documents quickly. This approach organizes search results according to aspects learned from search engine logs. Given a query, the steps of this approach are as follows:

1. Get its related information from search engine logs. A working set is formed from this information.
2. Learn the aspects from the information in the working set. These aspects correspond to users' interests.
3. Label each aspect with a representative query.
4. Categorize and organize the search results of the input query according to the aspects.

In other words, related past queries are first found in the preprocessed history data collection, the aspects are then learned by clustering, and finally the search results are categorized using a categorization algorithm.

H.-J. Zeng, Q.-C. He, Z. Chen, W.-Y. Ma, and J. Ma [3] worked on reformalizing the clustering problem. This approach consists of four steps:

1. Search result fetching
2. Document parsing and phrase property calculation
3. Salient phrase ranking
4. Post-processing

Given a query and the ranked list of search results, the whole list of titles and snippets is first parsed, all possible phrases are extracted from the contents, and several properties are calculated for each phrase, such as document frequency and phrase frequency. A regression model is then applied to combine these properties into a single salience score. Phrases are ranked according to the salience score, and the top-ranked phrases are taken as salient phrases. In post-processing, phrases consisting purely of stop words are filtered out.

Disadvantages: User feedback is not considered, so noisy results that are not clicked by users may be analysed.

R. Jones and K. L. Klinkner [4] defined session boundaries and the automatic hierarchical segmentation of search topics. This approach analyzes the typical timeouts used to divide query streams into sessions and hierarchically analyzes user search tasks into short-term goals and long-term missions. A timeout is an elapsed time of 30 minutes between queries, which signifies that the user has discontinued searching. Here, a combination of a diverse set of syntactic, temporal, query log, and web search features can predict mission boundaries and goals.
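The 30-minute timeout rule mentioned for [4] can be sketched as follows. This is only the timeout baseline for dividing a query stream into sessions, not the feature-based predictor of mission boundaries; the function name `split_sessions`, the `(timestamp, query)` tuple layout, and the choice that a gap of exactly 30 minutes stays in the same session are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# A query record is assumed to be a (timestamp, query text) pair.
Query = Tuple[datetime, str]


def split_sessions(
    queries: List[Query],
    timeout: timedelta = timedelta(minutes=30),
) -> List[List[Query]]:
    """Split one user's query stream into sessions by inactivity gap.

    A new session starts whenever the gap to the previous query
    exceeds the timeout (assumed strict inequality).
    """
    sessions: List[List[Query]] = []
    for timestamp, text in sorted(queries, key=lambda q: q[0]):
        if sessions and timestamp - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((timestamp, text))
        else:
            sessions.append([(timestamp, text)])
    return sessions


# Usage: the third query arrives 50 minutes after the second.
stream = [
    (datetime(2015, 1, 1, 10, 0), "jaguar"),
    (datetime(2015, 1, 1, 10, 10), "jaguar car price"),
    (datetime(2015, 1, 1, 11, 0), "weather tomorrow"),
]
for number, session in enumerate(split_sessions(stream), start=1):
    print(number, [text for _, text in session])
```

A fixed timeout is simple to apply but cannot tell whether two queries on either side of the gap belong to the same goal, which is why the surveyed work combines it with other features.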
Hence, the best approach to clustering queries within the same goal may build on first identifying the boundaries and then matching subsequent queries to existing segments.

Disadvantages: It only identifies whether a pair of queries belongs to the same goal or mission but does not care about what the goal is in detail.

Wangmeng Zuo, Lei Zhang, and Chunwei Song [5] use natural image statistics in image denoising and propose a texture-enhanced image denoising method that enforces the gradient histogram of the denoised image to be close to a reference gradient histogram of the original image. This idea should be considered for effective searching. LaiKuan Wong and Kok-Lim Low [6] used 5-fold cross-validation (5-CV) classification accuracy in work on enhancing image quality, thereby improving classification on web sites.

Jayaratne, Kithangodage Lakshman [7] proposed an image retrieval system that captures the semantics of an image through effective use of its associated text and uses an integrated system architecture that combines keyword-based retrieval with low-level image features to enhance the retrieval of images on the web. The system was based on an enhanced image representation that exploits the image semantics available from the text associated with the images and higher-level semantic categories derived from low-level image features. The user interface was designed to allow the user to communicate keyword-based queries and semantic categories to the image retrieval system. They confirmed that integrating the text associated with an image with low-level image features leads to an efficient retrieval system for content-based indexing of images on the web and substantially enhances image searching capabilities on the web. The work in [7] thus shows highly efficient search methods for retrieving images.

III. MODULE DESCRIPTION

Fig. 1 System Architecture




Publication date: 2015